Free-text Keystroke Authentication using Transformers: A Comparative Study of Architectures and Loss Functions
Keystroke biometrics is a promising approach for user identification and
verification, leveraging the unique patterns in individuals' typing behavior.
In this paper, we propose a Transformer-based network that employs
self-attention to extract informative features from keystroke sequences,
surpassing the performance of traditional Recurrent Neural Networks. We explore
two distinct architectures, namely bi-encoder and cross-encoder, and compare
their effectiveness in keystroke authentication. Furthermore, we investigate
different loss functions, including triplet, batch-all triplet, and WDCL loss,
along with various distance metrics such as Euclidean, Manhattan, and cosine
distances. These experiments allow us to optimize the training process and
enhance the performance of our model. To evaluate our proposed model, we employ
the Aalto desktop keystroke dataset. The results demonstrate that the
bi-encoder architecture with batch-all triplet loss and cosine distance
achieves the best performance, yielding an exceptional Equal Error Rate of
0.0186%. Furthermore, alternative algorithms for calculating similarity scores
are explored to enhance accuracy. Notably, the utilization of a one-class
Support Vector Machine reduces the Equal Error Rate to an impressive 0.0163%.
The outcomes of this study indicate that our model surpasses the previous
state-of-the-art in free-text keystroke authentication. These findings
contribute to advancing the field of keystroke authentication and offer
practical implications for secure user verification systems.
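The batch-all triplet objective named above compares every valid (anchor, positive, negative) combination in a batch and penalizes triplets where the positive is not closer than the negative by a margin. A minimal sketch with cosine distance follows; the function names, margin value, and the explicit loops are illustrative simplifications, not the paper's implementation (which would operate on GPU tensors in vectorized form):

```python
import numpy as np

def cosine_distance_matrix(emb):
    # Normalize rows, then cosine distance = 1 - cosine similarity.
    normed = emb / np.linalg.norm(emb, axis=1, keepdims=True)
    return 1.0 - normed @ normed.T

def batch_all_triplet_loss(emb, labels, margin=0.2):
    """Average hinge loss over all valid (anchor, positive, negative) triplets.

    emb: (batch, dim) embeddings; labels: per-row identity labels.
    Only triplets with a positive hinge value contribute to the mean
    (the usual "batch-all" convention).
    """
    d = cosine_distance_matrix(emb)
    n = len(labels)
    losses = []
    for a in range(n):
        for p in range(n):
            if p == a or labels[p] != labels[a]:
                continue  # positive must share the anchor's identity
            for neg in range(n):
                if labels[neg] == labels[a]:
                    continue  # negative must be a different identity
                losses.append(max(0.0, d[a, p] - d[a, neg] + margin))
    losses = np.array(losses)
    active = losses > 0
    return losses[active].mean() if active.any() else 0.0
```

With well-separated identities every triplet satisfies the margin and the loss is zero, which is exactly the signal that drives mining strategies like batch-all: only the still-violating triplets contribute gradient.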
Likelihood-Maximizing-Based Multiband Spectral Subtraction for Robust Speech Recognition
Automatic speech recognition performance degrades significantly when speech is affected by environmental noise. Nowadays, the major challenge is to achieve good robustness in adverse noisy conditions so that automatic speech recognizers can be used in real situations. Spectral subtraction (SS) is a well-known and effective approach; it was originally designed for improving the quality of a speech signal as judged by human listeners. SS techniques usually improve the quality and intelligibility of the speech signal, whereas speech recognition systems need compensation techniques to reduce the mismatch between noisy speech features and a clean-trained acoustic model. Nevertheless, correlation can be expected between speech quality improvement and an increase in recognition accuracy. This paper proposes a novel approach that treats SS and the speech recognizer not as two independent entities cascaded together, but as two interconnected components of a single system, sharing the common goal of improved speech recognition accuracy. The architecture feeds information from the statistical models of the recognition engine back into the tuning of SS parameters. By using this architecture, we overcome the drawbacks of previously proposed methods and achieve better recognition accuracy. Experimental evaluations show that the proposed method achieves significant improvements in recognition rates across a wide range of signal-to-noise ratios.
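The core spectral subtraction step is simple to state: subtract a scaled noise-power estimate from the noisy power spectrum and floor the result to avoid negative magnitudes. A hedged per-frame sketch is below; in the multiband variant a separate over-subtraction factor is used per frequency band, and in the paper's closed-loop design that factor would be tuned from recognizer-likelihood feedback rather than fixed as here. All names and default constants are illustrative:

```python
import numpy as np

def spectral_subtract(noisy_mag, noise_mag, alpha=2.0, beta=0.01):
    """Power spectral subtraction with over-subtraction and spectral flooring.

    noisy_mag, noise_mag: magnitude spectra for one frame (e.g. one STFT column).
    alpha: over-subtraction factor (the tunable parameter in this sketch).
    beta: spectral-floor fraction, kept to limit "musical noise" artifacts.
    """
    clean_power = noisy_mag ** 2 - alpha * noise_mag ** 2
    floor = beta * noisy_mag ** 2  # residual floor instead of clipping to zero
    return np.sqrt(np.maximum(clean_power, floor))
```

The enhanced magnitude is recombined with the noisy phase before inverse transformation; the recognizer's accuracy as a function of alpha (and of the per-band factors) is what the likelihood-maximizing feedback loop would optimize.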
Self-Supervised Representation Learning for Online Handwriting Text Classification
Self-supervised learning offers an efficient way of extracting rich
representations from various types of unlabeled data while avoiding the cost of
annotating large-scale datasets. This is achievable by designing a pretext task
to form pseudo labels with respect to the modality and domain of the data.
Given the evolving applications of online handwritten texts, in this study, we
propose the novel Part of Stroke Masking (POSM) as a pretext task for
pretraining models to extract informative representations from the online
handwriting of individuals in English and Chinese languages, along with two
suggested pipelines for fine-tuning the pretrained models. To evaluate the
quality of the extracted representations, we use both intrinsic and extrinsic
evaluation methods. The pretrained models are fine-tuned to achieve
state-of-the-art results in tasks such as writer identification, gender
classification, and handedness classification, highlighting the
superiority of the pretrained models over models trained from
scratch.
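As the abstract describes it, the POSM pretext task corrupts part of a stroke so the model must form representations that recover the missing handwriting dynamics. A hypothetical, much-simplified sketch of such a masking step on a (points x features) stroke array is shown below; the actual POSM design, masking scheme, and feature layout may differ:

```python
import numpy as np

def mask_stroke_part(stroke, mask_ratio=0.15, rng=None):
    """Zero out one contiguous span of stroke points as a pseudo-label target.

    stroke: array of shape (n_points, n_features), e.g. (x, y) or (x, y, t).
    Returns the masked copy, the original masked span (the reconstruction
    target for the pretext objective), and the span's start index.
    """
    if rng is None:
        rng = np.random.default_rng(0)
    n = len(stroke)
    span = max(1, int(n * mask_ratio))
    start = int(rng.integers(0, n - span + 1))
    masked = stroke.copy()
    target = stroke[start:start + span].copy()
    masked[start:start + span] = 0.0  # hide the span from the encoder
    return masked, target, start
```

A model pretrained to predict `target` from `masked` needs no labels, which is what lets the downstream writer-identification, gender, and handedness heads start from informative representations instead of random weights.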
Continuous Speech Recognition of Kazakh Language
This article describes methods for building a continuous speech recognition system for the Kazakh language. Compared with other languages, research on Kazakh speech recognition began relatively recently, only after the country gained independence, and Kazakh remains a low-resource language. A large amount of data is required to build a reliable system and to evaluate it accurately. A database has therefore been created for the Kazakh language, consisting of speech signals and their corresponding transcriptions. The continuous-speech corpus comprises recordings from 200 speakers of different genders and ages, together with a pronunciation vocabulary for the language. Both traditional models and deep neural networks have been used to train the system. As a result, a word error rate (WER) of 30.01% has been obtained.
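The reported 30.01% WER is the standard evaluation metric: the word-level edit distance (substitutions + deletions + insertions) between hypothesis and reference, divided by the reference length. A compact pure-Python reference implementation, for illustration:

```python
def word_error_rate(reference, hypothesis):
    """WER via Levenshtein distance over whitespace-separated words."""
    r, h = reference.split(), hypothesis.split()
    # d[i][j] = edit distance between first i reference and j hypothesis words.
    d = [[0] * (len(h) + 1) for _ in range(len(r) + 1)]
    for i in range(len(r) + 1):
        d[i][0] = i  # i deletions
    for j in range(len(h) + 1):
        d[0][j] = j  # j insertions
    for i in range(1, len(r) + 1):
        for j in range(1, len(h) + 1):
            sub_cost = 0 if r[i - 1] == h[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + sub_cost)  # match/substitution
    return d[len(r)][len(h)] / len(r)
```

Note that WER can exceed 100% when the hypothesis contains many insertions, which is one reason it is reported alongside the corpus description rather than as a bounded accuracy.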
Large Vocabulary Children’s Speech Recognition with DNN-HMM and SGMM Acoustic Modeling
In this paper, large vocabulary children’s speech recognition is investigated using the Deep Neural Network - Hidden Markov Model (DNN-HMM) hybrid and the Subspace Gaussian Mixture Model (SGMM) acoustic modeling approaches. In the investigated scenario, training data is limited to about 7 hours of speech from children in the age range 7-13, and testing data consists of read, clean speech from children in the same age range. To tackle inter-speaker acoustic variability, speaker adaptive training based on feature-space maximum likelihood linear regression, as well as vocal tract length normalization, are adopted. Experimental results show that very good recognition results can be achieved with both DNN-HMM and SGMM systems, although the best results are obtained with the DNN-HMM system.
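Of the adaptation techniques mentioned, vocal tract length normalization is the easiest to illustrate: each speaker's frequency axis is rescaled by a warp factor, typically with a piecewise-linear map that keeps the Nyquist frequency fixed. A hedged sketch of one common two-segment form follows; the constants, breakpoint placement, and names are illustrative, not the paper's exact configuration:

```python
def vtln_warp(freq, alpha, sample_rate=16000.0, cut=0.85):
    """Two-segment piecewise-linear VTLN frequency warp.

    Scales frequencies by the warp factor `alpha` below a breakpoint,
    then interpolates linearly so the Nyquist frequency maps to itself.
    """
    nyquist = sample_rate / 2.0
    # Place the breakpoint so the warped curve stays below Nyquist even
    # when alpha > 1 (compressing a long vocal tract toward the average).
    breakpoint_hz = cut * nyquist / max(alpha, 1.0)
    if freq <= breakpoint_hz:
        return alpha * freq
    slope = (nyquist - alpha * breakpoint_hz) / (nyquist - breakpoint_hz)
    return alpha * breakpoint_hz + slope * (freq - breakpoint_hz)
```

For children's speech the warp factors are typically further from 1.0 than for adults, which is why VTLN is a natural fit for the inter-speaker variability this abstract targets.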